Abstract

In this paper, we propose a global method for estimating the motion of a camera 
which films a static scene. Our approach is direct, fast and robust, and deals with adja- 
cent frames of a sequence. It is based on a quadratic approximation of the deformation 
between two images, in the case of a scene with constant depth in the camera coordi- 
nate system. This condition is very restrictive but we show that provided translation 
and depth inverse variations are small enough, the error on optical flow involved by 
the approximation of depths by a constant is small. In this context, we propose a new 
model of camera motion that separates the image deformation into a similarity
and a "purely" projective application, due to the change of optical axis direction. This
model leads to a quadratic approximation of image deformation that we estimate with
an M-estimator; we can immediately deduce the camera motion parameters.

1 Introduction 

The estimation of camera motion plays a crucial role in many domains of computer vision 
such as the recovery of scene structure, medical imaging, augmented reality and so on. This 
is a difficult task since the motion of a pixel between two images depends not only on the 
six parameters of camera motion between the two successive image captures, but also on 
the depth at the corresponding point in the static scene. Existing methods can be classified 
as features correspondences-based approaches, which are local, optical flow methods and 
direct methods, which are global. 

Among all proposed methods using feature correspondences, one can mention recursive
techniques based on extended Kalman filters [1, 2], which track camera motion
and estimate the structure of the scene. The essential matrix, which was first defined by
Longuet-Higgins in [3], is often estimated, as only a few correspondences in two images
are sufficient; the number of required correspondences is discussed by Faugeras et al. in
[4, 5, 6]. In the case of an uncalibrated camera, the analogous approach is described in [7]
with the fundamental matrix.

The use of optical flow avoids the choice of "good" features; many authors use the basic
bilinear constraint linking optical flow, camera velocities and depths of projected points; in
[8], Bruss and Horn apply an algebraic computation to remove depth from the bilinear
constraint and use numerical optimization techniques. Heeger and Jepson, in [9], decouple
the translational velocity from the rotational velocity and use linear subspace methods. Ma
et al. in [10] and Brooks et al. in [11] use a different approach with the epipolar differential
constraint: a differential essential matrix is determined from the optical flow, leading to
a unique camera velocity estimation. Another well-known approach is based on motion
parallax, notably developed by Tomasi and Shi in [12], Lawn and Cipolla in [13] and
Irani et al. in [15]. Tomasi et al. propose in [14] a comparison of algorithms which only
use optical flow for estimating camera motion.

Finally, direct methods use directly the content of a pair of images. They are generally
based on the constraint of constant illumination (also called optical flow constraint),
that is minimized by a least-squares approach, on the parameters of a given motion model.
Different assumptions are used to avoid estimating depths at all points; for example, Horn
and Weldon in [16] and Bergen et al. in [17] assume that the depth map is locally constant.
In [18], Negahdaripour and Horn consider that it is planar or quadratic.

Let us notice that features correspondences-based techniques work best with well sep- 
arated views, when the displacement (especially the translation or the so-called baseline) 
between frames is sufficiently large. On the contrary, optical flow methods and direct meth- 
ods, based on infinitesimal approximations, are well-adapted to very small motions. 

Our method deals with adjacent frames of a sequence, so with narrow baselines and 
restricted camera rotations. It is a direct method, very fast and robust, based on a quadratic 
approximation of image deformation. 

The outline of the paper is as follows. In Section 2, we describe our framework. We 
recall the image deformation generated by camera motion. Then, we show that we can 
assume in the deformation formula that the depth of projected points is constant (in the camera
coordinate system) under the following condition: the product of the norm of the translation with
the maximal variation of inverse depth has to be sufficiently small. Thus, two consecutive
images are linked by a planar transformation. In this context, we introduce in Section 3 
the registration group, used for modeling image deformation generated by a camera dis- 
placement. We also propose a new camera motion decomposition, that separates image 
deformation in a "purely" projective deformation, due to change of optical axis direction, 
and a similarity. As camera displacement is restricted, we obtain a quadratic approxima- 
tion of optical flow between two adjacent frames. This approximation is used in Section 
4 to define an algorithm of motion estimation; we show estimation results on synthetic se- 
quences and use motion estimations on real video sequences for mosaicing and simplified 
augmented reality. Concluding remarks are given in Section 5. 

2 Framework 

2.1 Pinhole camera model 

A camera projects a point in 3D space on a 2D image. This transformation can be described
using the well-known pinhole camera model [7] presented in Figure 1. The camera is located
at C, the optical center, and directed by k, the optical axis. The camera projects a
point M of the 3D space on the plane $\mathcal{R}$: {Z = f_c}. The plane $\mathcal{R}$ is called the retinal
plane and f_c the focal length. The projection m of M is then the intersection of the optical
ray (CM) with $\mathcal{R}$.

Let c be the intersection of the optical axis with $\mathcal{R}$. If (X, Y, Z) are the coordinates
of M in the camera coordinate system (C, i, j, k) and (x, y) the coordinates of m in the
orthogonal basis (c, i, j), the relationship between (x, y) and (X, Y, Z) is the following:

$$ x = f_c \frac{X}{Z}, \qquad y = f_c \frac{Y}{Z}. $$

As f_c just acts as a scaling factor on the image, we choose in this paper, without loss of
generality, to set the focal length to one. Then, f_c will be the unit of camera and image
coordinate systems.




Figure 1: Pinhole camera model.



2.2 Camera motion 



Let D be a displacement of the camera or, in an equivalent way, a displacement of the
plane $\mathcal{R}$. The movement D may be written in a unique way as D = (R, t), where R is a
rotation with axis containing C and t a translation. The set of displacements D = (R, t)
forms the Lie group of rigid transformations in $\mathbb{R}^3$ called SE(3), which denotes the special
Euclidean group. The displacement D = (R, t) transforms a point M belonging to $\mathbb{R}^3$
into M' = RM + t. Thus, the camera is identified before the displacement by (C, i, j, k)
and after the displacement by (C', R(i), R(j), R(k)) with CC' = t. In the following, we
denote

$$ R = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} \quad \text{and} \quad t = \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix}. $$

Let now f and g be two adjacent images in a sequence defined on rectangular domains
K of $\mathcal{R}$ and K' of $\mathcal{R}'$ (with f_c = 1). Let M be a point in $\mathbb{R}^3$ such that its projections m and
m' on $\mathcal{R}$ and $\mathcal{R}'$ belong to K and K'. We denote m = (x, y) in (c, i, j) and m' = (x', y')
in (c', R(i), R(j)). Thus, if we make the assumption of constant illumination, we have

$$ f(x, y) = g(x', y'), $$



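and the two points are linked by

$$ x' = \frac{a_1 x + a_2 y + a_3 - \frac{\langle t, R(i) \rangle}{Z(x,y)}}{c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z(x,y)}}, \qquad y' = \frac{b_1 x + b_2 y + b_3 - \frac{\langle t, R(j) \rangle}{Z(x,y)}}{c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z(x,y)}} \qquad (1) $$

and

$$ x = \frac{a_1 x' + b_1 y' + c_1 + \frac{t_1}{Z'(x',y')}}{a_3 x' + b_3 y' + c_3 + \frac{t_3}{Z'(x',y')}}, \qquad y = \frac{a_2 x' + b_2 y' + c_2 + \frac{t_2}{Z'(x',y')}}{a_3 x' + b_3 y' + c_3 + \frac{t_3}{Z'(x',y')}} \qquad (2) $$

where Z(x, y) and Z'(x', y') are the depths of M respectively in (C, i, j, k) and
(C', R(i), R(j), R(k)).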
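The relation between matching points can be checked numerically. The following is a minimal sketch, assuming a unit focal length and the convention that R(i), R(j), R(k) are the columns of the rotation matrix; the function names and the numerical values are illustrative and do not come from the paper.

```python
import numpy as np

def project(M):
    """Pinhole projection with unit focal length: (X, Y, Z) -> (x, y)."""
    return M[:2] / M[2]

def transfer(x, y, Z, R, t):
    """Image point (x', y') after the displacement D = (R, t), as in formula (1)."""
    p = np.array([x, y, 1.0])
    # Coordinates of M in the moved camera frame, divided by Z: R^T (p - t / Z).
    q = R.T @ (p - t / Z)
    return q[:2] / q[2]

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Illustrative values only.
R = rot_z(0.02)
t = np.array([0.05, -0.02, 0.01])
M = np.array([0.4, -0.3, 5.0])       # 3D point expressed in the first camera frame
x, y = project(M)
M_new = R.T @ (M - t)                # same point expressed in the moved camera frame
x_new, y_new = transfer(x, y, M[2], R, t)
```

Projecting the point directly in the moved frame and applying the transfer formula to (x, y) give the same image point, which is what formula (1) expresses.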
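2.3 Depths approximation by a constant

We now wish to approximate the depths by a constant in the two formulas (1) and (2). Let
Z_0 > 0. By a first-order Taylor expansion of equation (1) in 1/Z(x, y) about 1/Z_0, we obtain

$$ x' = \frac{a_1 x + a_2 y + a_3 - \frac{\langle t, R(i) \rangle}{Z_0}}{c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z_0}} + \left( \frac{1}{Z(x,y)} - \frac{1}{Z_0} \right) \frac{\langle t, R(k) \rangle (a_1 x + a_2 y + a_3) - \langle t, R(i) \rangle (c_1 x + c_2 y + c_3)}{\left( c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z_0} \right)^2} + o\left( \frac{1}{Z(x,y)} - \frac{1}{Z_0} \right), $$

$$ y' = \frac{b_1 x + b_2 y + b_3 - \frac{\langle t, R(j) \rangle}{Z_0}}{c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z_0}} + \left( \frac{1}{Z(x,y)} - \frac{1}{Z_0} \right) \frac{\langle t, R(k) \rangle (b_1 x + b_2 y + b_3) - \langle t, R(j) \rangle (c_1 x + c_2 y + c_3)}{\left( c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z_0} \right)^2} + o\left( \frac{1}{Z(x,y)} - \frac{1}{Z_0} \right). $$

Thus, if for all (x, y) in K, $\left| \frac{1}{Z(x,y)} - \frac{1}{Z_0} \right| \|t\|$ is small enough with respect to the image
coordinates, we can substitute Z_0 in place of Z(x, y).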

We now make some numerical and technical assumptions that are only mildly restrictive and
are therefore likely to be satisfied by a pair of consecutive images.

Hypothesis 1 - Let D = (R, t) ∈ SE(3) and K be the rectangular domain where f is
defined. Let Z be the depth function of projected points, defined on K. We assume that

$$ \frac{1}{\left| c_1 x + c_2 y + c_3 - \frac{\langle t, R(k) \rangle}{Z(x,y)} \right|} \le \frac{4}{3}. $$



Hypothesis 2 - Let D = (R, t) ∈ SE(3) and K be the rectangular domain where f
is defined, having maximal dimension L. Let Z be the depth function of projected points,
defined on K. For two matching points (x, y) and (x', y') (in the sense of formulas (1) and
(2)), we suppose that

$$ \max\{ |x' - x|, |y' - y| \} \le \frac{L}{2}. $$



The first hypothesis comes from the fact that the variation of optical axis direction
and its translation along the axis k, between two consecutive acquisitions, have to be very
small for the images to be usable. The second one formulates the limitation of point
displacements between two images; we assume that the two components of optical flow
cannot be larger than half of the larger image dimension.

With these two assumptions, we show in Appendix A the following theorem.

Theorem 1 - Let D = (R, t) ∈ SE(3) and K be the rectangular domain where f is
defined, and having maximal dimension L. Let Z be the depth function of projected points,
defined on K, bounded by Z_inf > 0 and Z_sup. We assume that Z and D verify hypotheses
1 and 2. If

$$ \left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3} \le \varepsilon, \qquad (3) $$

then there exists Z_0 > 0 so that we can replace Z(x, y) by Z_0 in the equations (1) with an
error bounded by ε.
The value of Z_0 that minimizes ε is

$$ Z_0 = \arg\min_{Z > 0} \max_{(x,y) \in K} \left| \frac{1}{Z(x,y)} - \frac{1}{Z} \right| = \frac{2 Z_{sup} Z_{inf}}{Z_{sup} + Z_{inf}}. $$



We can also show that we can substitute the same Z_0 in place of Z'(x', y') in equations (2)
with an error bounded by ε + ε' if

^ "^ll(L + l)<£^ (4) 



For small values of ε and ε', conditions (3) and (4) can be verified in the following cases:

• if there is no translation, depths do not appear in formulas (1) and (2),

• if t ≠ 0, the scene must be far enough from the camera to verify condition (4).
The variations of amplitude of 1/Z must also be small enough to verify condition
(3): the further the scene is from the camera, the larger the authorized
variations of depth.
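As a small numerical illustration, the sketch below computes the constant depth that minimizes the largest deviation of inverse depth, and evaluates the quantity compared with ε. It assumes that condition (3) reads (1/Z_inf - 1/Z_sup) ||t|| 2(L+1)/3 ≤ ε, which is the form consistent with the values reported later in the paper; function names and numerical values are hypothetical.

```python
def optimal_constant_depth(z_inf, z_sup):
    """Z0 minimising max |1/Z - 1/Z0| over depths in [z_inf, z_sup]."""
    return 2.0 * z_inf * z_sup / (z_inf + z_sup)

def depth_condition_lhs(z_inf, z_sup, t_norm, L):
    """Left-hand side of condition (3) under the assumed form, to be compared with epsilon."""
    return (1.0 / z_inf - 1.0 / z_sup) * t_norm * 2.0 * (L + 1.0) / 3.0

# Illustrative values, in units of focal length.
z0 = optimal_constant_depth(4.0, 12.0)
lhs = depth_condition_lhs(4.0, 12.0, t_norm=0.05, L=2.0)
```

With this choice, 1/Z_0 is the midpoint of 1/Z_sup and 1/Z_inf, so the deviation of inverse depth is the same at both extremes of the depth range.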

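With this framework, relations (1) and (2) between f and g become

$$ (x', y') = \psi(x, y) = \left( \frac{a_1 x + a_2 y + a_3 - \langle \tilde{t}, R(i) \rangle}{c_1 x + c_2 y + c_3 - \langle \tilde{t}, R(k) \rangle}, \; \frac{b_1 x + b_2 y + b_3 - \langle \tilde{t}, R(j) \rangle}{c_1 x + c_2 y + c_3 - \langle \tilde{t}, R(k) \rangle} \right) $$

and

$$ (x, y) = \varphi(x', y') = \left( \frac{a_1 x' + b_1 y' + c_1 + \tilde{t}_1}{a_3 x' + b_3 y' + c_3 + \tilde{t}_3}, \; \frac{a_2 x' + b_2 y' + c_2 + \tilde{t}_2}{a_3 x' + b_3 y' + c_3 + \tilde{t}_3} \right) $$

where $\tilde{t} = t / Z_0$. In the sequel of the paper, we will assume that conditions (3) and (4) are
verified: we will use applications φ and ψ as the relations between f and g. As we will
consider two consecutive images in a sequence, the translation $\tilde{t}$ is very small.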



3 Modeling

We now consider two consecutive images f and g in a sequence, obtained before and after
a camera motion D = (R, t).

3.1 Registration group 

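The applications φ and ψ are projective applications, each defined by six parameters, three
for the rotation and three for the translation. Projective applications are classically
represented in the projective group in $\mathbb{R}^2$. This group is isomorphic to the special linear group
SL($\mathbb{R}^3$) of invertible matrices. Thus, the applications φ and ψ are associated to the following
invertible matrices $M_\varphi$ and $M_\psi$:

$$ M_\varphi = \begin{pmatrix} a_1 & b_1 & c_1 + \tilde{t}_1 \\ a_2 & b_2 & c_2 + \tilde{t}_2 \\ a_3 & b_3 & c_3 + \tilde{t}_3 \end{pmatrix} = R \begin{pmatrix} 1 & 0 & \langle \tilde{t}, R(i) \rangle \\ 0 & 1 & \langle \tilde{t}, R(j) \rangle \\ 0 & 0 & 1 + \langle \tilde{t}, R(k) \rangle \end{pmatrix} = RH \qquad (5) $$

and

$$ M_\psi = \begin{pmatrix} a_1 & a_2 & a_3 - \langle \tilde{t}, R(i) \rangle \\ b_1 & b_2 & b_3 - \langle \tilde{t}, R(j) \rangle \\ c_1 & c_2 & c_3 - \langle \tilde{t}, R(k) \rangle \end{pmatrix} = R^{-1} \begin{pmatrix} 1 & 0 & -\tilde{t}_1 \\ 0 & 1 & -\tilde{t}_2 \\ 0 & 0 & 1 - \tilde{t}_3 \end{pmatrix} = R^{-1} H'. $$

Our aim is to estimate camera motion through image deformation, each defined by six
parameters. But the projective group is an eight-parameter group and the matrix
decomposition shows that $M_\psi \neq M_\varphi^{-1}$ in SL($\mathbb{R}^3$). Thus we are going to model the projective
transformation in another, better adapted group: the registration group, introduced by Dibos
in [19].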

Definition 1 - Let $\mathcal{A}$ be the subset of projective applications

$$ \mathcal{A} = \left\{ \phi : \mathbb{R}^2 \to \mathbb{R}^2 \text{ so that } \forall (x, y) \in \mathbb{R}^2, \; \phi(x, y) = \left( \frac{a_1 x + b_1 y + c_1 + \alpha}{a_3 x + b_3 y + c_3 + \gamma}, \; \frac{a_2 x + b_2 y + c_2 + \beta}{a_3 x + b_3 y + c_3 + \gamma} \right), \right. $$

$$ \left. \text{where } R = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} \in SO(3) \text{ and } (\alpha, \beta, \gamma) \in \mathbb{R}^3 \right\}. $$

The registration group is ($\mathcal{A}$, *), where the composition law * is deduced from the
composition law ∘ of SE(3) through the isomorphism

$$ I : \mathcal{A} \to SE(3), \qquad \forall \phi \in \mathcal{A}, \; I(\phi) = (R, t) $$

where R is the rotation defined above and t = (α, β, γ) is the translation.

More precisely, let $\phi_1$ and $\phi_2$ belong to $\mathcal{A}$; they correspond to the displacements D_1 =
(R_1, t_1) and D_2 = (R_2, t_2), respectively. Then, $\phi_1 * \phi_2 = \phi$ where $\phi$ is the projective
application associated to the displacement D = D_1 ∘ D_2 = (R, t) where t is the translation
with vector t = t_1 + R_1 t_2 and R = R_1 R_2. The notation D_1 ∘ D_2 means that the camera
first performs the displacement D_1 and second D_2. Moreover, if $\phi$ belongs to $\mathcal{A}$ and is
associated to D = (R, t), then $\phi^{-1}$ is associated to $D^{-1} = (R^{-1}, -R^{-1} t)$.

The applications φ and ψ belong to $\mathcal{A}$; we have g(x, y) = f(φ(x, y)) and f(x, y) =
g(ψ(x, y)) with $\psi = \varphi^{-1}$ in the registration group (but not in the projective group).

By modeling the camera displacement in the registration group, we reduce the problem 
to the determination of six parameters of a planar application, as R and t are respectively 
defined by three parameters. 
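The difference between the two group structures can be illustrated numerically. The sketch below is only an illustration under stated assumptions: a displacement is stored as a pair (R, t) of NumPy arrays, t already stands for the translation divided by the constant depth, and the helper names are hypothetical.

```python
import numpy as np

def compose(D1, D2):
    """Group law of SE(3) used by the registration group: D1 o D2 = (R1 R2, t1 + R1 t2)."""
    R1, t1 = D1
    R2, t2 = D2
    return R1 @ R2, t1 + R1 @ t2

def inverse(D):
    """Inverse displacement (R^-1, -R^-1 t)."""
    R, t = D
    return R.T, -R.T @ t

def matrix_of(D):
    """3x3 matrix of the projective application associated with D: R with t added to its last column."""
    R, t = D
    M = R.copy()
    M[:, 2] += t
    return M

c, s = np.cos(0.03), np.sin(0.03)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
D = (R, np.array([0.05, -0.02, 0.04]))

R_id, t_id = compose(D, inverse(D))                # identity displacement
product = matrix_of(inverse(D)) @ matrix_of(D)     # matrix product in the projective group
```

Composing D with its inverse in SE(3) returns the identity displacement, whereas the product of the two associated 3x3 matrices is not proportional to the identity when the translation has a component along the optical axis.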

3.2 Camera motion decomposition 

We propose here to decompose a camera motion in order to separate the image deformation
into two components: a similarity part and a "purely" projective part. Indeed, any camera
motion can be decomposed into three basic types of motion:

• a translation, which produces a homothety-translation on the image f belonging to
the plane $\mathcal{R}$,

• a rotation with axis k, which produces a planar rotation on f,

• a rotation with axis in the plane (C, i, j), which distorts f.

3.2.1 Decomposition of rotation 

Let us consider a camera rotation R with axis containing C. We decompose R in two
particular rotations R_2 R_1. The first one R_1, with axis Δ belonging to the plane (C, i, j),
transforms the direction of the optical axis k in R(k); this rotation induces a projective
deformation of the image f. The second one R_2 is a rotation with axis R(k); R_2 induces a
planar rotation of the image R_1(f). Any camera rotation can be written in such a way.

This decomposition is interesting because of the induced deformations of the image.
R_1 produces a "purely" projective deformation of the image f whereas R_2 creates a planar
rotation of the image R_1(f).

Let us express the rotation R_1 with two parameters: θ for the location of Δ in the plane
(C, i, j) and α for the angle of the rotation. If we denote $R^l_\alpha$ the rotation matrix with axis l
and angle α, the expression of R_1 in (C, i, j, k) is

$$ R_1 = R^k_\theta R^i_\alpha R^k_{-\theta}, $$

which we denote in the following $R_{\theta,\alpha}$. Now, let β be the angle of the rotation R_2 around
the new optical axis R(k). We can then write the rotation R_2 in (C, i, j, k)

$$ R_2 = R^k_\theta R^i_\alpha R^k_{-\theta} \, R^k_\beta \, R^k_\theta R^i_{-\alpha} R^k_{-\theta}. $$

Finally, the expression of the global rotation R is

$$ R = R_2 R_1 = R^k_\theta R^i_\alpha R^k_{-\theta} R^k_\beta = R_{\theta,\alpha} R_\beta, \qquad \text{with } R_\beta = R^k_\beta. $$

Thus, the rotation R may also be decomposed in a rotation around the axis k followed by
the rotation $R_{\theta,\alpha}$.
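The decomposition can be sketched in a few lines. This is an illustrative implementation, not the authors' code: it assumes right-handed rotation matrices acting on column vectors, and it recovers (θ, α, β) from the image R(k) of the optical axis, which is one possible way to invert the decomposition when 0 < α < π.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_theta_alpha(theta, alpha):
    """Rotation of angle alpha about the axis (cos theta, sin theta, 0) of the plane (C, i, j)."""
    return rot_z(theta) @ rot_x(alpha) @ rot_z(-theta)

def decompose(R):
    """Return (theta, alpha, beta) such that R = R_{theta,alpha} R^k_beta, for 0 < alpha < pi."""
    k_new = R[:, 2]                               # image of the optical axis, R(k)
    alpha = np.arccos(np.clip(k_new[2], -1.0, 1.0))
    theta = np.arctan2(k_new[0], -k_new[1])       # not meaningful when alpha = 0
    R_beta = rot_theta_alpha(theta, alpha).T @ R  # remaining rotation about k
    beta = np.arctan2(R_beta[1, 0], R_beta[0, 0])
    return theta, alpha, beta

# Illustrative values only.
R = rot_theta_alpha(0.7, 0.02) @ rot_z(-0.04)
theta, alpha, beta = decompose(R)
```

The recovery of θ relies on the fact that R(k) = (sin θ sin α, -cos θ sin α, cos α) under these conventions; when α vanishes the axis Δ is undefined and any θ is acceptable.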

Figure 2: Decomposition of a camera rotation R in two rotations R_2 R_1.



3.2.2 Decomposition of a complete motion 

A complete camera motion D = (R, t) induces a projective deformation of the image f.
The matrix associated to φ is RH, according to formula (5), which can now be written as

$$ RH = R_{\theta,\alpha} R_\beta H. $$

If we denote $r_{\theta,\alpha}$ the "purely" projective deformation associated to the rotation $R_{\theta,\alpha}$ and s
the similarity associated to $R_\beta H$ then we have

$$ g(x, y) = f(\varphi(x, y)) = f(r_{\theta,\alpha} \circ s(x, y)) = f \circ r_{\theta,\alpha} \circ s(x, y). $$

We obtain therefore six parameters defining the camera motion, two for the rotation
$R_{\theta,\alpha}$ and four for the translation and the rotation $R_\beta$. We express now camera motion with
the following parameters (θ, α, β, A, B, C) where (-A, -B, -C) are the coordinates of $\tilde{t}$
in the basis (R(i), R(j), R(k)). These new notations give a simpler expression of
the projective application ψ (the inverse of φ in the registration group), which we will use
later:

$$ \psi(x, y) = \left( \frac{a_1 x + a_2 y + a_3 + A}{c_1 x + c_2 y + c_3 + C}, \; \frac{b_1 x + b_2 y + b_3 + B}{c_1 x + c_2 y + c_3 + C} \right). $$

Note that the six parameters (θ, α, β, A, B, C) give explicit access to the camera
displacement D = (R, t). Indeed,

$$ \tilde{t} = -A R(i) - B R(j) - C R(k), \qquad R = R_{\theta,\alpha} R_\beta. $$

3.3 Parameter values 

As we consider two successive images of a video sequence with a high frame rate (classically
24 images per second), the camera motion between two images is very small and the
parameter values are restricted, except for the angle θ which belongs to ]-π, π]. Let us
remark that the dimensions of K and K' verify a practical constraint: the view angle of a
camera is usually not larger than 150°. This means that L, the maximal dimension of K,
must verify L < 8 f_c, as the relation between the view angle a, f_c and L, illustrated on
Figure 3, is

$$ \tan\frac{a}{2} = \frac{L}{2 f_c}. $$

As f_c = 1, we have L < 8.
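The bound on L can be checked with a short computation. The sketch assumes that the relation between the view angle a, the focal length and L is tan(a/2) = L / (2 f_c); the function name is illustrative.

```python
import math

def max_dimension(view_angle_deg, focal_length=1.0):
    """Maximal image dimension L seen under a given view angle: L = 2 f_c tan(a / 2)."""
    return 2.0 * focal_length * math.tan(math.radians(view_angle_deg) / 2.0)

L_150 = max_dimension(150.0)   # widest view angle considered in the text
L_90 = max_dimension(90.0)     # view angle used for the synthetic sequences
```

For a view angle of 150° this gives a value of L slightly below 7.5 focal lengths, which is consistent with the bound L < 8.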



Figure 3: Relation between the view angle a of the camera, the focal length f_c and the
maximal dimension L of images.



Table 1 gives orders of magnitude of parameter values that we have obtained by
experiment, when we take a unit focal length. These experiments consist in taking images
and applying the six-parameter projective application. As the images must not be too
deformed, we deduce the orders of magnitude of the parameters.



Parameter        Values
θ (radian)       ]-π, π]
α (radian)       [0, 0.03]
β (radian)       [-0.05, 0.05]
A, B             [-0.09, 0.09]
C                [-0.03, 0.03]

Table 1: Parameter values (A, B and C are expressed in units of focal length).
3.4 Optical flow approximation

Theorem 2 - Let us consider a scene orthogonal to the axis k. Let D = (R, t) belong to
SE(3), also denoted D = (θ, α, β, A, B, C). Let K and K' be the domains where f and
g are defined, with maximal dimension L, and (x, y) and (x', y') two matching points of K
and K'. We assume that hypothesis 1 is verified, |α| < 1 and |β| < 1. Then, the optical
flow at (x, y) verifies

$$ x' - x = -Cx + A + \beta y + \alpha x (y \cos\theta - x \sin\theta) - \alpha \sin\theta + o(C) + o(\alpha) + o(\beta) + \dots $$

$$ y' - y = -Cy + B - \beta x + \alpha y (y \cos\theta - x \sin\theta) + \alpha \cos\theta + o(C) + o(\alpha) + o(\beta) + \dots $$

and

$$ \left| x' - x - \left( -Cx + A + \beta y + \alpha x (y \cos\theta - x \sin\theta) - \alpha \sin\theta \right) \right| \le T(L, \alpha, \beta, A, C) $$

$$ \left| y' - y - \left( -Cy + B - \beta x + \alpha y (y \cos\theta - x \sin\theta) + \alpha \cos\theta \right) \right| \le T(L, \alpha, \beta, B, C) $$

with

T{L,a,f3,A,C)= [l^ ^-^ + + + Mf^ 

+L [a' (2+|/3| + ^) + M£^ + ^ + f + ^ + 



4\AC\ 
3 



The proof of this theorem is given in the Appendix. Thanks to the parameter values
given in Table 1, the optical flow can be approximated by a quadratic formula in (x, y).
Indeed, these parameter values make the bound T small in comparison to the
value of each component of optical flow. For example, in the case of a pure translation with
A = B = 0.09 and C = 0.03, the bound T is equal to 4.2 10"^ for L = 1 and 8.4 10"^ 
for L = 8, whereas the components of optical flow have an order of magnitude of 10~^ or 
10~^. For a purely projective rotation with a = 0.01, the optical flow has an order of 10~^ 
and the bound is equal to 3 10""^ for L = 1 and 5.2 10"^ for L = 4. For L = 8, the optical 
flow has an order of 10~^ and the bound is 3.6 10~^. 

If L, α, β, A, B, C are sufficiently small, the optical flow can be approximated by the
sum of three independent terms: the component (-Cx + A, -Cy + B) is due to the
translation of the camera, (βy, -βx) to the rotation $R_\beta$ and (αx(-x sin θ + y cos θ) -
α sin θ, αy(-x sin θ + y cos θ) + α cos θ) to the rotation $R_{\theta,\alpha}$. These three terms are
approximations of the optical flows respectively produced by the translation, the rotation $R_\beta$
and the rotation $R_{\theta,\alpha}$.
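The quality of the quadratic approximation can be explored numerically. The following sketch compares the exact displacement given by ψ with the quadratic formula of Theorem 2, for one hypothetical set of parameters chosen inside the ranges of Table 1. It assumes the rotation convention R = R_{θ,α} R_β with right-handed rotation matrices; all names are illustrative.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def psi(x, y, theta, alpha, beta, A, B, C):
    """Exact projective application psi for the parameters (theta, alpha, beta, A, B, C)."""
    R = rot_z(theta) @ rot_x(alpha) @ rot_z(-theta) @ rot_z(beta)
    q = R.T @ np.array([x, y, 1.0]) + np.array([A, B, C])
    return q[:2] / q[2]

def quadratic_flow(x, y, theta, alpha, beta, A, B, C):
    """Quadratic approximation of the optical flow (main terms of Theorem 2)."""
    w = y * np.cos(theta) - x * np.sin(theta)
    u = -C * x + A + beta * y + alpha * x * w - alpha * np.sin(theta)
    v = -C * y + B - beta * x + alpha * y * w + alpha * np.cos(theta)
    return np.array([u, v])

# Hypothetical parameter values inside the ranges of Table 1.
params = dict(theta=0.7, alpha=0.01, beta=0.02, A=0.03, B=-0.02, C=0.01)
x, y = 0.4, -0.3
exact = psi(x, y, **params) - np.array([x, y])
approx = quadratic_flow(x, y, **params)
gap = np.max(np.abs(exact - approx))
```

For such small parameters the gap between the two flows is made of second-order terms only, so it stays well below the flow itself.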

Remarks 

• Let us remark that at the image center, when x and y have 10~ ^ order (for a unit focal 
length), the quadratic term is negligible in comparison to the other terms. Thus, the 
deformation of the center of the image is mainly affine. 

Figure 4: Decomposition of deformation. On the left, a checkerboard deformed by a camera
motion. On the right, the deformation can be decomposed in, first, a "purely" projective
deformation, generated by the rotation $R_{\theta,\alpha}$ (top), followed by a similarity (bottom).

• At the beginning of this paper, we assumed that the translation t and the depth of
the scene have to verify

$$ \left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3} \le \varepsilon $$

for substituting depths by a constant in formulas (1). As the approximation of optical
flow has an order of 10~^, we must choose an approximation error e at least inferior 
to $10^{-2}$.



3.5 Modeling assets

In this section, we have first proposed to work in the registration group, well-adapted to the
projective applications φ and ψ that link two consecutive images f and g. The advantage
of this group is the isomorphism with the Lie group SE(3), which allows us to compose
projective deformations through the composition of camera motions.

Second, we have described a new camera motion decomposition to emphasize two
components of image deformation: a similarity and a "purely" projective deformation, due
to the change of optical axis direction. This decomposition is interesting because it
corresponds to a physical perception of camera motion effects on consecutive images. As shown
on Figure 4, we easily perceive the two deformations: the "purely" projective deformation,
which deforms parallels on the checkerboard, and the similarity, which preserves angles.
With this decomposition, we have obtained a quadratic approximation of optical flow for
two consecutive images, where the quadratic term is only due to the change of optical axis
direction. Remark that we only need condition (3) for approximating equation (1) by ψ.

4 Camera motion estimation 

Let f and g be two adjacent images in a video sequence. In this section, we propose a
method for estimating camera motion between f and g, based on camera motion
decomposition and optical flow quadratic approximation.
4.1 Algorithm

Odobez and Bouthemy propose in [20] a method for determining 2D parametric motions
between two images. They use constant, affine or quadratic models. Their method is robust,
multiresolution and only uses spatial and temporal gradients of intensity. The software,
developed by the authors, is available at the address http://www.irisa.fr/Vista/Motion2D

Let us now briefly describe their algorithm. The optical flow at a point (x, y) is assumed
to be parametric, denoted $u_\Theta(x, y)$, where Θ is the set of parameters. Several models are
proposed; the most general has 12 parameters

$$ u_\Theta(x, y) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} q_1 & q_2 & q_3 \\ q_4 & q_5 & q_6 \end{pmatrix} \begin{pmatrix} x^2 \\ xy \\ y^2 \end{pmatrix}. $$

The displaced frame difference (DFD) associated to a parametric motion model at the
point (x, y) is defined by

$$ DFD_{(\Theta,\xi)}(x, y) = g((x, y) + u_\Theta(x, y)) - f(x, y) + \xi $$

where ξ is a global intensity shift to account for global illumination change. The set of
parameters is thus estimated by minimizing the following function

$$ \sum_{(x,y)} \rho\left( DFD_{(\Theta,\xi)}(x, y), \tau \right) $$

where the function ρ is called an M-estimator since its minimization corresponds to the
maximum-likelihood estimation if ρ is considered as the opposite log-likelihood of the
model. The authors choose a function bounded for high values in order to eliminate the
contribution of outliers. They use Tukey's biweight function defined as

$$ \rho(t, \tau) = \begin{cases} \frac{t^2}{2} \left( \tau^4 - \tau^2 t^2 + \frac{t^4}{3} \right) & \text{if } |t| < \tau, \\ \frac{\tau^6}{6} & \text{otherwise.} \end{cases} $$

The minimization is performed using an incremental and multiresolution scheme
described in [20]. This method is accurate and has a low computational cost.
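To make the role of the robust cost concrete, here is a minimal sketch of Tukey's biweight function in the unnormalized polynomial form t^6/6 - τ^2 t^4/2 + τ^4 t^2/2, together with the weight ρ'(t)/t that an iteratively reweighted least-squares scheme would use. This is only an illustration of the cost function; it is not the Motion2D implementation, and the function names are hypothetical.

```python
import numpy as np

def tukey_rho(r, tau):
    """Tukey biweight cost: polynomial inside ]-tau, tau[, constant tau^6 / 6 outside."""
    r = np.asarray(r, dtype=float)
    inside = r**6 / 6.0 - tau**2 * r**4 / 2.0 + tau**4 * r**2 / 2.0
    return np.where(np.abs(r) < tau, inside, tau**6 / 6.0)

def tukey_weight(r, tau):
    """Weight rho'(r) / r = (tau^2 - r^2)^2 inside ]-tau, tau[, zero outside."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) < tau, (tau**2 - r**2) ** 2, 0.0)

residuals = np.array([0.0, 0.5, 1.9, 2.0, 50.0])
costs = tukey_rho(residuals, tau=2.0)
weights = tukey_weight(residuals, tau=2.0)
```

Residuals larger than τ all receive the same bounded cost and a zero weight, which is how outliers stop influencing the estimate.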

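Several models are proposed in the software but none corresponds to our optical flow
approximation. Thus, we have added the following model to the software

$$ u(x, y) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_1 & a_2 \\ -a_2 & a_1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} q_1 & q_2 & 0 \\ 0 & q_1 & q_2 \end{pmatrix} \begin{pmatrix} x^2 \\ xy \\ y^2 \end{pmatrix}. $$

Once the six parameters (c_1, c_2, a_1, a_2, q_1, q_2) are estimated, we convert them into α, β, θ,
A, B, C by identifying the previous expression with the quadratic formula given in Theorem 2:

$$ \alpha = \sqrt{q_1^2 + q_2^2}, $$

$$ \theta = \begin{cases} -\arctan(q_1 / q_2) & \text{if } q_2 > 0, \\ -\arctan(q_1 / q_2) + \pi & \text{if } q_2 < 0, \\ \pi / 2 & \text{if } q_2 = 0 \text{ and } q_1 < 0, \\ -\pi / 2 & \text{if } q_2 = 0 \text{ and } q_1 > 0, \end{cases} $$

$$ \beta = a_2, \qquad A = c_1 + \alpha \sin\theta, \qquad B = c_2 - \alpha \cos\theta, \qquad C = -a_1. $$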
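The conversion can be sketched as follows. The code identifies the six flow parameters with the quadratic formula of Theorem 2, which gives q_1 = -α sin θ and q_2 = α cos θ, so that θ is obtained with a two-argument arctangent; the function names are illustrative and the inverse mapping is included only to check the round trip.

```python
import math

def to_camera_parameters(c1, c2, a1, a2, q1, q2):
    """Convert the six flow parameters into (theta, alpha, beta, A, B, C)."""
    alpha = math.hypot(q1, q2)
    theta = math.atan2(-q1, q2)   # q1 = -alpha sin(theta), q2 = alpha cos(theta)
    beta = a2
    A = c1 + alpha * math.sin(theta)
    B = c2 - alpha * math.cos(theta)
    C = -a1
    return theta, alpha, beta, A, B, C

def to_flow_parameters(theta, alpha, beta, A, B, C):
    """Inverse mapping, read off the quadratic formula of Theorem 2."""
    return (A - alpha * math.sin(theta), B + alpha * math.cos(theta),
            -C, beta, -alpha * math.sin(theta), alpha * math.cos(theta))

# Hypothetical camera parameters (theta, alpha, beta, A, B, C).
camera = (2.5, 0.02, -0.03, 0.05, -0.04, 0.01)
recovered = to_camera_parameters(*to_flow_parameters(*camera))
```

Note that θ is not defined when q_1 = q_2 = 0, that is when α = 0: in this case the camera motion has no "purely" projective component.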
4.2 Results

The performance of our method is illustrated through camera motion estimations on
synthetic and real sequences, and some applications of these estimations. The context for
applying our method is given by condition (3)

$$ \left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3} \le \varepsilon $$

with $\varepsilon < 10^{-2}$. This means that for a given image size, the product of the translation norm and
the variations of inverse depth must be small enough. We do not need condition (4) since
we only use the deformation ψ.

4.2.1 Synthetic sequences 

We first estimate camera motion on sequences that we have created from an image,
considered as orthogonal to the optical axis and deformed with sets of six parameters (θ, α, β,
A, B, C). These sets are randomly generated with respect to the values given in Table 1. The
angle of view is equal to 90°. Three sequences of 200 images are synthesized; the first one
is generated with translations, the second one with rotations and the third one with complete
motions. The initial image is shown on Figure 5. We assume that depth is constant and
apply formula (2) to the image with a bilinear interpolation.




Figure 5: Initial image for test sequences. 





                     Translation        Rotation axis      Rotation angle error
                     direction error    direction error    absolute    relative
Complete motions     9.7°               17.3°              0.03°       2.2%
Pure translations    4.5°               -                  0.01°       -
Pure rotations       -                  18.2°              0.002°      0.1%

Table 2: Results of camera motion estimations on 3 synthetic sequences of 200 images. The
errors are averaged errors computed over each sequence.

Camera motion results are shown in Table 2. Whatever the type of camera motion,
the estimations of translation direction are correct up to a few degrees and the estimated
rotation axis direction up to ten or twenty degrees. These last errors may seem large
but we must notice that the change of optical axis direction is hard to estimate, as a small
rotation and a small translation can produce very similar results on images. For example, a
small translation with direction i and a small rotation with axis j produce very close effects
on images. The estimations of rotation angle are more accurate; they are correct up to a few
hundredths of a degree for rotation angles of 1 or 2 degrees. In sum, the results obtained are
rather good, and better when motions are reduced to a translation or a rotation. Moreover, the
scene was quite complicated and the method is very fast: it takes 7.7 seconds for a sequence of
200 images with 284 x 188 pixels, with a Pentium M 1.8 GHz processor.



Robustness Figure 6 shows the robustness of the algorithm to impulse or Gaussian noise.
We add various amounts of impulse or Gaussian noise to the sequence produced with
complete motions. Graphs plot errors in the estimates as a function of noise level, averaged over
the 200 images at each noise level. For both types of noise, the errors do not increase
much: they remain close to errors computed without noise, less than 15 degrees for translation
direction, at most a few tenths of a degree for the angle of rotation (for impulse noise). Thus the
method is robust, thanks to the use of an M-estimator: it provides good results even when the
amount of impulse noise is large.



Depths influence In this paper, we have approximated the deformation (equation (1))
between g and f by ψ, provided that condition (3) was verified, with $\varepsilon < 10^{-2}$:

$$ \left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3} \le \varepsilon. $$

The smaller $\left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3}$ is, the more accurate the approximation. For a
given scene, the further the camera is from the scene, the smaller the previous expression and
the better the estimation. This fact is illustrated with motion estimation on synthetic
sequences SOFA5 and SOFA6 (Sequences for Optical Flow Analysis, courtesy of the
Computer Vision Group, Heriot-Watt University). Each sequence, which contains 20 images,
is given with internal and external camera parameters, and camera motion. Motions
are basic: a translation of direction k for SOFA5 and a rotation with axis k followed by a
translation with direction k for SOFA6. Images of the two sequences are shown on Figure 7.
Results are given in Tables 4 and 5; the evaluation of $\left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \|t\| \, \frac{2(L+1)}{3}$ is also
computed (in units of focal length) in Table 3.



             (1/Z_inf - 1/Z_sup) ||t||    (1/Z_inf - 1/Z_sup) ||t|| 2(L+1)/3
Image 1      0.0062                       0.0076
Image 10     0.0112                       0.0137
Image 20     0.0293                       0.0357

Table 3: Relative variations of inverse of depths in sequences SOFA5 and SOFA6. Depths
Z_inf and Z_sup, ||t|| and L are expressed in units of focal length in the camera system.



As the camera comes closer to the scene, the values in Table 3 increase over time. Remark
that we have L < 8; the angle of view is equal to 45°. Tables 4 and 5 give errors in motion
estimation between consecutive images at three instants: at the beginning of the sequence,
in the middle and at the end. The estimation method is the same as previously used: we
assume no a priori type of motion. For SOFA5, the translation direction estimates are very
good, better than on the previous synthetic sequences. This is due to the simplicity of the
motion and to the fixed optical axis. However, we observe that when the camera comes closer
to the scene, the translation estimation error and the rotation angle estimation (that should be
null) slightly increase. For SOFA6, the translation direction estimates are always very good; but
the estimation errors on axis and angle of rotation increase significantly when the camera
comes closer to the scene.

Although errors increase when we get close to the scene (because we then move away
from the defined context), our method still allows conclusions to be drawn for simple motions
(for example when the optical axis is fixed) even if condition (3) is not verified with $\varepsilon < 10^{-2}$.

Figure 6: Camera motion estimation errors, averaged over 200 images of the noisy
sequence (estimation of translation direction, of rotation angle and of rotation axis direction,
each plotted against the level of impulse noise and of Gaussian noise). Impulse noise level of 10
means that 10% of pixel values are randomly chosen with a uniform variable distributed on all
gray levels. Gaussian noise level of 10 means that we add to the images a Gaussian noise with
standard deviation 10.

4.2.2 Applications on real sequences 

As we have no real sequences with given camera motion and internal camera parameters, 
we illustrate the quality of camera motion estimation with two applications of estimation 
results. 

Figure 7: At the top, images 1 and 2 of SOFA5 and SOFA6. In the middle, images 19 and
20 of SOFA5 and, at the bottom, images 19 and 20 of SOFA6.

                            Translation        Rotation
                            direction error    angle error
Between images 1 and 2      0.12°              0.0005°
Between images 10 and 11    0.17°              0.0018°
Between images 19 and 20    0.55°              0.019°
Average error               0.42°              0.014°

Table 4: Estimation errors on SOFA5. Camera motion is constant over the sequence: it is a
translation of direction k (the camera comes close to the scene).

The first use is mosaicing. In our framework, we suppose that two successive images
are linked by a planar transformation, thus the knowledge of camera motion between these
two images makes it possible to register one image onto the other. With the estimation of
camera motion over a whole sequence, we can compute the motion between two images distant
in time by composing the displacement estimations in the registration group. Thus, by
choosing an image viewpoint and registering some images distant in time onto it, we obtain a
bigger image, as could be observed from the chosen viewpoint but with a larger field of view.
Figures 8 and 9 show two panoramas, computed with the estimated camera motion on a real
video sequence of an office. Note that mosaicing is theoretically possible if the viewpoint
does not change (when there is no translation) or when the camera films a planar scene.
Our movie does not exactly satisfy the hypothesis of pure rotation: although the camera
translation is very small between adjacent frames, it may be significant between two images
distant in time, and the scene is obviously not planar. But as the scene is rather far from
the camera, the registrations are correct.
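The composition of frame-to-frame estimates used for mosaicing can be sketched as follows. The sketch assumes that each deformation between adjacent frames is available as a 3x3 matrix acting on homogeneous image coordinates; the paper's own parametrization of the registration group is not reproduced here, and the function names and numerical values are made up for illustration.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_map(h, x, y):
    """Apply the planar projective map h to the image point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def compose(steps):
    """Chain maps frame i -> frame i+1 into a single map frame 0 -> frame n."""
    total = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for h in steps:
        total = matmul3(h, total)  # the most recent map acts last
    return total


# Two made-up frame-to-frame maps (illustrative values only).
H01 = [[1.0, 0.02, 0.01], [-0.02, 1.0, -0.03], [0.001, 0.002, 1.0]]
H12 = [[0.98, 0.0, 0.05], [0.0, 0.98, 0.0], [0.003, -0.001, 1.0]]

point = (0.2, -0.1)
step_by_step = apply_map(H12, *apply_map(H01, *point))
in_one_go = apply_map(compose([H01, H12]), *point)
```

Mapping a point through the two maps one after the other, or through their composition, gives the same result up to rounding, which is what allows images distant in time to be registered onto a common viewpoint.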

Figure 8: At the top, scenes 20, 35 and 50 of the office sequence; at the bottom,
reconstructed panoramic view on viewpoint 35.

                            Translation   Rotation axis   Rotation angle error
                            direction     direction
                            error         error           absolute    relative
Between images 1 and 2      0.23°         0.001°          0.051°      2.5%
Between images 10 and 11    0.38°         0.491°          0.068°      3.4%
Between images 19 and 20    0.97°         1.08°           0.094°      4.7%
Average error               0.39°         0.269°          0.069°      3.4%

Table 5: Estimation errors on SOFA6. Camera motion is constant over the sequence: it is
a rotation of axis k followed by a translation of direction k (the camera comes close to the
scene).

The second use is augmented reality. It consists in adding an object to a sequence in
such a way that it appears to be present in the scene. In our framework, the application is
simplified since we insert a planar object, a poster, into the office sequence. This
poster is first inserted on the main planar region of the scene, roughly parallel to the retinal
plane. Next, it is deformed with the projective application [6] associated with the estimated
camera motion. Example frames from the augmented sequence are presented in Figure 10.
This experiment shows that the camera motion is accurately estimated: the poster moves
with the same motion as the background of the scene. More precisely, the poster orientation
follows the orientation of the background (camera rotations are correctly estimated) and its
position is plausible.

Let us recall that our goal is neither mosaicing nor augmented reality: these two
applications merely use the estimated camera motions and illustrate the quality of our motion
estimation results in our framework.
Figure 9: At the top, scenes 10, 30, 60, 70 and 80 of the office sequence; at the bottom,
reconstructed panoramic view on viewpoint 60.




Figure 10: Replacement of the notice board by a cinema poster. At the top, the insertion of
the poster in the first image. In the middle, images 10, 20, 30, 40 and 45 of the new sequence
obtained by deforming the poster with the estimated camera motions and pasting it into
the sequence.



5 Conclusion 

In this paper, we have proposed a new global method for the problem of egomotion
estimation, well adapted to adjacent frames as produced by a camera that films a static scene,
when the variations of the inverse scene depths and the translation are sufficiently small.
This context is theoretically limited, but as the translation is very small between two
acquisitions, it is not so restrictive. In this context, the method is very fast: first because,
being a direct method, it does not require computing optical flow or matching points; second
because of the multiresolution scheme of the Motion2D software, fitted to our quadratic
approximation of optical flow. It is also robust, thanks to the use of an M-estimator.
Moreover, modeling camera motion in the registration group makes it possible to compose
image deformations and to obtain the camera motion between two images distant in time in a
sequence. Finally, as it is a global method, it is robust to a moving object in the scene,
provided its size is small compared to the image size.



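A Proof of theorem 1

Let $0 < Z_{inf} < Z_0 < Z_{sup}$ and let $(x, y)$ belong to $K$. We denote
$\delta = \frac{1}{Z(x,y)} - \frac{1}{Z_0}$. Thus, we can write formula (1) as

\[
x' = \frac{u_0^1 - \delta\,\langle t, R(i)\rangle}{v_0 - \delta\,\langle t, R(k)\rangle},
\qquad
y' = \frac{u_0^2 - \delta\,\langle t, R(j)\rangle}{v_0 - \delta\,\langle t, R(k)\rangle},
\]

where

\[
\begin{aligned}
u_0^1 &= a_1 x + a_2 y + a_3 - \tfrac{1}{Z_0}\langle t, R(i)\rangle, \\
u_0^2 &= b_1 x + b_2 y + b_3 - \tfrac{1}{Z_0}\langle t, R(j)\rangle, \\
v_0 &= c_1 x + c_2 y + c_3 - \tfrac{1}{Z_0}\langle t, R(k)\rangle.
\end{aligned}
\]

By applying Taylor's formula in $\delta$ about $0$ with integral form of the remainder, we obtain

\[
\begin{aligned}
x' &= \frac{u_0^1}{v_0} + \delta \int_0^1 \frac{\langle t, R(k)\rangle\, u_0^1 - \langle t, R(i)\rangle\, v_0}{\left(v_0 - z\,\delta\,\langle t, R(k)\rangle\right)^2}\, dz
    = \frac{u_0^1}{v_0} + \delta\, \frac{\langle t, R(k)\rangle\, u_0^1 - \langle t, R(i)\rangle\, v_0}{v_0 \left(v_0 - \delta\,\langle t, R(k)\rangle\right)}, \\
y' &= \frac{u_0^2}{v_0} + \delta \int_0^1 \frac{\langle t, R(k)\rangle\, u_0^2 - \langle t, R(j)\rangle\, v_0}{\left(v_0 - z\,\delta\,\langle t, R(k)\rangle\right)^2}\, dz
    = \frac{u_0^2}{v_0} + \delta\, \frac{\langle t, R(k)\rangle\, u_0^2 - \langle t, R(j)\rangle\, v_0}{v_0 \left(v_0 - \delta\,\langle t, R(k)\rangle\right)},
\end{aligned}
\]

that implies

\[
\begin{aligned}
\left| x' - \frac{u_0^1}{v_0} \right|
  &= |\delta|\, \frac{\left| \langle t, R(k)\rangle\, u_0^1 - \langle t, R(i)\rangle\, v_0 \right|}{\left| v_0 \left(v_0 - \delta\,\langle t, R(k)\rangle\right) \right|}
  \le |\delta|\, \|t\|\, \frac{|u_0^1| + |v_0|}{|v_0|}\, \frac{1}{\left| v_0 - \delta\,\langle t, R(k)\rangle \right|}, \\
\left| y' - \frac{u_0^2}{v_0} \right|
  &= |\delta|\, \frac{\left| \langle t, R(k)\rangle\, u_0^2 - \langle t, R(j)\rangle\, v_0 \right|}{\left| v_0 \left(v_0 - \delta\,\langle t, R(k)\rangle\right) \right|}
  \le |\delta|\, \|t\|\, \frac{|u_0^2| + |v_0|}{|v_0|}\, \frac{1}{\left| v_0 - \delta\,\langle t, R(k)\rangle \right|}.
\end{aligned}
\]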
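Since $(x, y) \in K \subset [-L, L]^2$, we have, with hypothesis (2),

\[
\frac{|u_0^1| + |v_0|}{|v_0|} \le |x| + 1 \le L + 1,
\qquad
\frac{|u_0^2| + |v_0|}{|v_0|} \le |y| + 1 \le L + 1.
\]

Moreover, hypothesis (1) implies

\[
\frac{1}{\left| v_0 - \delta\,\langle t, R(k)\rangle \right|} \le \frac{4}{3},
\]

thus

\[
\max\left( \left| x' - \frac{u_0^1}{v_0} \right|,\ \left| y' - \frac{u_0^2}{v_0} \right| \right)
\le \frac{4(L+1)}{3}\, |\delta|\, \|t\|.
\]

Now, if

\[
\|t\| \left( \frac{1}{Z_{inf}} - \frac{1}{Z_{sup}} \right) \le \frac{3\varepsilon}{2(L+1)},
\]

then, for $Z_0$ such that $\frac{1}{Z_0} = \frac{1}{2}\left( \frac{1}{Z_{inf}} + \frac{1}{Z_{sup}} \right)$, we have

\[
|\delta|\, \|t\| \le \frac{3\varepsilon}{4(L+1)},
\]

that implies

\[
\forall (x, y) \in K, \quad
\max\left( \left| x' - \frac{u_0^1}{v_0} \right|,\ \left| y' - \frac{u_0^2}{v_0} \right| \right) \le \varepsilon.
\]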
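Two elementary facts used in this proof can be checked numerically: the closed form of the difference between the exact position and its constant-depth approximation, and the fact that choosing $1/Z_0$ as the midpoint of the inverse depth range bounds $|\delta|$ by half the width of that range. The sketch below uses hypothetical scalar values; `p` and `q` stand for the inner products of the translation with R(i) and R(k), and none of the numbers come from the paper.

```python
def exact_position(u0, v0, p, q, delta):
    """x' = (u0 - delta*p) / (v0 - delta*q), the position at the true depth."""
    return (u0 - delta * p) / (v0 - delta * q)


def remainder(u0, v0, p, q, delta):
    """Closed form of x' - u0/v0 after integrating the Taylor remainder."""
    return delta * (q * u0 - p * v0) / (v0 * (v0 - delta * q))


def max_inverse_depth_gap(z_inf, z_sup, samples=1000):
    """Largest |1/Z - 1/Z0| over Z in [z_inf, z_sup], with 1/Z0 the midpoint
    of 1/z_inf and 1/z_sup."""
    inv_z0 = 0.5 * (1.0 / z_inf + 1.0 / z_sup)
    gaps = []
    for n in range(samples + 1):
        z = z_inf + (z_sup - z_inf) * n / samples
        gaps.append(abs(1.0 / z - inv_z0))
    return max(gaps)


# Hypothetical values, chosen only to exercise the formulas.
u0, v0, p, q, delta = 0.31, 1.02, 0.4, -0.2, 0.05
difference = exact_position(u0, v0, p, q, delta) - u0 / v0
closed_form = remainder(u0, v0, p, q, delta)

z_inf, z_sup = 4.0, 5.0
half_width = 0.5 * (1.0 / z_inf - 1.0 / z_sup)
largest_gap = max_inverse_depth_gap(z_inf, z_sup)
```

Here `difference` and `closed_form` agree up to rounding, and `largest_gap` never exceeds `half_width`; the gap is largest at the two ends of the depth range.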



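B Proof of theorem 2

Let $D = (\theta, \alpha, \beta, A, B, C)$ be a camera motion. The rotation matrix $R$ is equal to

\[
R = \begin{pmatrix}
\cos\beta - (1-\cos\alpha)\sin\theta\,\sin(\theta-\beta) & -\sin\beta + (1-\cos\alpha)\sin\theta\,\cos(\theta-\beta) & \sin\theta\,\sin\alpha \\
\sin\beta + (1-\cos\alpha)\cos\theta\,\sin(\theta-\beta) & \cos\beta - (1-\cos\alpha)\cos\theta\,\cos(\theta-\beta) & -\cos\theta\,\sin\alpha \\
-\sin\alpha\,\sin(\theta-\beta) & \sin\alpha\,\cos(\theta-\beta) & \cos\alpha
\end{pmatrix}
\]

that we also denote

\[
R = \begin{pmatrix}
a_1 & b_1 & c_1 \\
a_2 & b_2 & c_2 \\
a_3 & b_3 & c_3
\end{pmatrix}.
\]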



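The coefficients of $R$ verify, by using Taylor expansions in $\alpha$ and $\beta$,

\[
\begin{array}{lll}
a_1 = 1 + k_{a_1}, & k_{a_1} = o(\beta) + o(\alpha), & |k_{a_1}| \le \frac{\beta^2}{2} + \frac{\alpha^2}{2}(1 + |\beta|), \\[2pt]
a_2 = \beta + k_{a_2}, & k_{a_2} = o(\beta^2) + o(\alpha), & |k_{a_2}| \le \frac{|\beta|^3}{6} + \frac{\alpha^2}{2}(1 + |\beta|), \\[2pt]
a_3 = -\alpha\sin\theta + k_{a_3}, & k_{a_3} = o(\alpha^2) + o(\sqrt{|\alpha\beta|}), & |k_{a_3}| \le \frac{|\alpha|^3}{6} + |\alpha|\left(|\beta| + \frac{\beta^2}{2}\right), \\[2pt]
b_1 = -\beta + k_{b_1}, & k_{b_1} = o(\beta^2) + o(\alpha), & |k_{b_1}| \le \frac{|\beta|^3}{6} + \frac{\alpha^2}{2}(1 + |\beta|), \\[2pt]
b_2 = 1 + k_{b_2}, & k_{b_2} = o(\beta) + o(\alpha), & |k_{b_2}| \le \frac{\beta^2}{2} + \frac{\alpha^2}{2}(1 + |\beta|), \\[2pt]
b_3 = \alpha\cos\theta + k_{b_3}, & k_{b_3} = o(\alpha^2) + o(\sqrt{|\alpha\beta|}), & |k_{b_3}| \le \frac{|\alpha|^3}{6} + |\alpha|\left(|\beta| + \frac{\beta^2}{2}\right), \\[2pt]
c_1 = \alpha\sin\theta + k_{c_1}, & k_{c_1} = o(\alpha^2), & |k_{c_1}| \le \frac{|\alpha|^3}{6}, \\[2pt]
c_2 = -\alpha\cos\theta + k_{c_2}, & k_{c_2} = o(\alpha^2), & |k_{c_2}| \le \frac{|\alpha|^3}{6}, \\[2pt]
c_3 = 1 + k_{c_3}, & k_{c_3} = o(\alpha), & |k_{c_3}| \le \frac{\alpha^2}{2}.
\end{array}
\]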
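The bounds on the remainders can be checked numerically against the rotation matrix given at the start of this appendix. The sketch below is only a sanity check under stated assumptions: the coefficients are read so that the first row of $R$ is $(a_1, b_1, c_1)$, the angles are sampled at random in a small range, and the helper names are hypothetical.

```python
import math
import random


def rotation(theta, alpha, beta):
    """Rotation matrix R of this appendix, as a list of rows."""
    s, c = math.sin, math.cos
    k = 1.0 - c(alpha)
    return [
        [c(beta) - k * s(theta) * s(theta - beta),
         -s(beta) + k * s(theta) * c(theta - beta),
         s(theta) * s(alpha)],
        [s(beta) + k * c(theta) * s(theta - beta),
         c(beta) - k * c(theta) * c(theta - beta),
         -c(theta) * s(alpha)],
        [-s(alpha) * s(theta - beta), s(alpha) * c(theta - beta), c(alpha)],
    ]


def remainders_and_bounds(theta, alpha, beta):
    """Map each coefficient name to (remainder k, claimed bound on |k|)."""
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = rotation(theta, alpha, beta)
    abs_a, abs_b = abs(alpha), abs(beta)
    quad = alpha ** 2 / 2 * (1 + abs_b)
    mixed = abs_a * (abs_b + beta ** 2 / 2)
    return {
        "a1": (a1 - 1, beta ** 2 / 2 + quad),
        "a2": (a2 - beta, abs_b ** 3 / 6 + quad),
        "a3": (a3 + alpha * math.sin(theta), abs_a ** 3 / 6 + mixed),
        "b1": (b1 + beta, abs_b ** 3 / 6 + quad),
        "b2": (b2 - 1, beta ** 2 / 2 + quad),
        "b3": (b3 - alpha * math.cos(theta), abs_a ** 3 / 6 + mixed),
        "c1": (c1 - alpha * math.sin(theta), abs_a ** 3 / 6),
        "c2": (c2 + alpha * math.cos(theta), abs_a ** 3 / 6),
        "c3": (c3 - 1, alpha ** 2 / 2),
    }


def worst_slack(trials=200, seed=1, max_angle=0.3):
    """Smallest value of (bound - |k|) over random small motions."""
    rng = random.Random(seed)
    slack = float("inf")
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        alpha = rng.uniform(-max_angle, max_angle)
        beta = rng.uniform(-max_angle, max_angle)
        for k, bound in remainders_and_bounds(theta, alpha, beta).values():
            slack = min(slack, bound - abs(k))
    return slack


slack = worst_slack()
```

A non-negative `slack` (up to rounding) means that every sampled remainder stays within its bound.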





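According to the definition of the application $\psi$, we have

\[
\begin{aligned}
x' - x &= \frac{x + \beta y - \alpha\sin\theta + A + o(\alpha) + o(\beta) + o(\sqrt{|\alpha\beta|})}{\alpha\sin\theta\, x - \alpha\cos\theta\, y + 1 + C + o(\alpha)} - x, \\
y' - y &= \frac{y - \beta x + \alpha\cos\theta + B + o(\alpha) + o(\beta) + o(\sqrt{|\alpha\beta|})}{\alpha\sin\theta\, x - \alpha\cos\theta\, y + 1 + C + o(\alpha)} - y,
\end{aligned}
\]

that is

\[
\begin{aligned}
x' - x &= \left(x + \beta y - \alpha\sin\theta + A + o(\alpha) + o(\beta) + o(\sqrt{|\alpha\beta|})\right)
          \left(1 - C - \alpha\sin\theta\, x + \alpha\cos\theta\, y + o(\alpha) + o(C)\right) - x, \\
y' - y &= \left(y - \beta x + \alpha\cos\theta + B + o(\alpha) + o(\beta) + o(\sqrt{|\alpha\beta|})\right)
          \left(1 - C - \alpha\sin\theta\, x + \alpha\cos\theta\, y + o(\alpha) + o(C)\right) - y.
\end{aligned}
\]

That implies

\[
\begin{cases}
x' - x = -Cx + \beta y - \alpha\sin\theta + A - \alpha\sin\theta\, x^2 + \alpha\cos\theta\, xy + o(\alpha) + o(\beta) + o(C), \\
y' - y = -Cy - \beta x + \alpha\cos\theta + B - \alpha\sin\theta\, xy + \alpha\cos\theta\, y^2 + o(\alpha) + o(\beta) + o(C).
\end{cases}
\]

Furthermore,

\[
\begin{aligned}
&\left| x' - x - \left(-Cx + \beta y - \alpha\sin\theta + A - \alpha\sin\theta\, x^2 + \alpha\cos\theta\, xy\right) \right| \\
&\quad = \left| \frac{-c_1 x^2 - c_2 xy + (a_1 - c_3 - C)x + a_2 y + a_3 + A
   - (c_1 x + c_2 y + c_3 + C)\left(A - Cx + \beta y + \alpha\cos\theta\, xy - \alpha\sin\theta\, x^2 - \alpha\sin\theta\right)}{c_1 x + c_2 y + c_3 + C} \right|.
\end{aligned}
\]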

By using bounds of | , |/Ca2 1 ^ • • • ^ l^cs | and the hypothesis[Tl we get 

\x' — X — (— Py — asinO ^ A — asinO x'^ ^ a cos xy) | 
< I I x^(-ci + Cci + asinOc^ + asin6>C) - y'^f3c2-^ 

xy{—C2 + Cc2 — Pci — a cos 6>C3 — a cos 6>C) + x'^y{—cia cos ^ + sin 6)-\- 
x^{a sin Oci) — xy'^C2a cos 6 + x{ai — cs — C — Ac\ + Cia sin d + Cca + + 
7/(a2 - Ac2 + C20L sin 6> - ;^C3 - + as + A(l - C3 - C) + a sin 6>(c3 + C) | . 

As (x, ?/) G we obtain 

\x' — X — {—Cx -\- Py — a sin -\- A — a sin x'^ -\- a cos xy) | 

+L (a^ (2+1/31 + KI^) + ^ + Mca + f+2Ci + M!) 

In a similar way, we bound
$\left| y' - y - \left(-Cy - \beta x + \alpha\cos\theta + B - \alpha\sin\theta\, xy + \alpha\cos\theta\, y^2\right) \right|$
by replacing $A$ with $B$.
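The quality of the quadratic approximation can be illustrated numerically. The sketch below compares the exact displacement, read off the fraction used in this appendix as $x' = (a_1 x + a_2 y + a_3 + A)/(c_1 x + c_2 y + c_3 + C)$ and $y' = (b_1 x + b_2 y + b_3 + B)/(c_1 x + c_2 y + c_3 + C)$, with the quadratic model. The motion parameters and the image point are made-up small values, and the function names are hypothetical.

```python
import math


def rotation(theta, alpha, beta):
    """Rotation matrix R of this appendix, as a list of rows."""
    s, c = math.sin, math.cos
    k = 1.0 - c(alpha)
    return [
        [c(beta) - k * s(theta) * s(theta - beta),
         -s(beta) + k * s(theta) * c(theta - beta),
         s(theta) * s(alpha)],
        [s(beta) + k * c(theta) * s(theta - beta),
         c(beta) - k * c(theta) * c(theta - beta),
         -c(theta) * s(alpha)],
        [-s(alpha) * s(theta - beta), s(alpha) * c(theta - beta), c(alpha)],
    ]


def exact_displacement(x, y, theta, alpha, beta, A, B, C):
    """(x' - x, y' - y) from the projective form of the deformation."""
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = rotation(theta, alpha, beta)
    w = c1 * x + c2 * y + c3 + C
    return ((a1 * x + a2 * y + a3 + A) / w - x,
            (b1 * x + b2 * y + b3 + B) / w - y)


def quadratic_displacement(x, y, theta, alpha, beta, A, B, C):
    """(x' - x, y' - y) from the quadratic approximation."""
    s, c = math.sin(theta), math.cos(theta)
    dx = -C * x + beta * y - alpha * s + A - alpha * s * x * x + alpha * c * x * y
    dy = -C * y - beta * x + alpha * c + B - alpha * s * x * y + alpha * c * y * y
    return dx, dy


# Made-up small motion and image point (illustrative values only).
motion = dict(theta=0.7, alpha=2e-3, beta=1e-3, A=1e-3, B=-2e-3, C=1e-3)
exact = exact_displacement(0.3, -0.2, **motion)
approx = quadratic_displacement(0.3, -0.2, **motion)
error = max(abs(exact[0] - approx[0]), abs(exact[1] - approx[1]))
```

For parameters of the order of 10^-3, the displacement itself is of the same order, while `error` is of second order in the parameters and therefore much smaller.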